Abstract:

This paper intends to construct a survey data quality strategy for institutional 
researchers in higher education in light of total survey error theory. It starts by
describing the characteristics of institutional research and identifying the gaps in the
literature regarding survey data quality issues in institutional research. It then
introduces the quality perspective of a survey process and the major components of
total survey error. A proposed strategy for inspecting survey data quality is presented on
the basis of five types of survey error and the characteristics of institutional research 
survey projects. The strategy consists of quality measures for each type of survey error, 
and quality control and quality assurance procedures for each of the quality measures. 
The major components in the survey data quality strategy are summarized in Table 2 in 
this paper, with elaborations in related sections. The paper ends with a discussion about 
the implications of the strategy for institutional researchers. A checklist for inspecting 
survey data quality is attached as an appendix. 

Since the 1960s, institutional research (IR) has emerged as a profession and gradually
become an organizational function of higher education institutions in North America. Its 
presence in higher education is a response to changing demands of society for its 
institutions, such as calls for increased accountability and efficiency (McLaughlin & 
Howard, 2001). Initially, IR appeared as a decentralized set of activities conducted in 
various offices throughout university/college campuses; however, its function has 
become centralized in institutional research offices. 

Institutional research helps answer three questions essential to the sustained 
development of an organization: Where is the organization at this moment? Where is the 
organization going? And how can the organization best arrive at its desired end? 
(Middaugh, Trusheim, & Bauer, 1994, p. 1). Institutional research fosters organizational
learning (Leimer, 2009). It has increasingly become a core
administrative function through its integration into the strategic planning and
assessment processes of the institution (Morest, 2009). It is anticipated that institutional
researchers will not only continue to fulfill their core function by converting data into 
information but will also become change agents by actively engaging in the process of 
managing and leading institutional change (Swing, 2009). 

What is institutional research about? In his seminal monograph on institutional
research, Saupe (1990) defines institutional research as “research conducted within an 
institution of higher education to provide information which supports institutional 
planning, policy formation and decision making” (p. 1). As such, the main product of 
institutional research is information. There are multiple institutional research measures 
including those related to organizational inputs (e.g., students, faculty and staff, facilities, 
and revenues), processes (e.g., academic programs, program completion, quality, 
productivity, and strategic planning), outputs (e.g., graduates, student outcomes) and the 
external environment (e.g., financial considerations, employment market, government 
concerns, and regional accreditation) (Middaugh, Trusheim, & Bauer, 1994). The task of 
institutional researchers involves converting complex data to actionable information 
(Anderson, Milner, & Foley, 2008) and concerns three interdependent forms of 
“organizational intelligence” (Fincher, 1985): technical/analytical intelligence, issues 
intelligence, and contextual intelligence (Terenzini, 1993). Institutional research draws 
from various methodologies including applied research, program evaluation, policy 
analysis and action research depending on questions it needs to address (Saupe, 1990). 

Institutional research distinguishes itself from higher education research in that 
institutional researchers are more concerned about knowledge about a specific institution 
or system of institutions whereas higher education researchers are more interested in
advancement of theory and practice in higher education in general (Saupe, 1990; 
Terenzini, 1993). Another difference is the importance of the utilization or application of 
the information produced through institutional research. The use of information is 
viewed as an essential component in the “life cycle” of an institutional research activity. 
Along with other stages of design, collection, preparation, analysis, and dissemination, 
application of the research results and feedback received are the “primary determinant[s] 
of success” (Borden, Massa, & Milam, 2001, p. 200). 

Institutional research involves three data sources: institutional information systems or 
administrative data (e.g., student enrolment data, faculty data), institutional surveys or in- 
house surveys, and external data sources (regional or national data, e.g., Integrated 
Postsecondary Education Data System in the United States) (Borden et al., 2001).
Although a locally prepared survey may be the best option for obtaining the needed 
information, information users in higher education institutions generally have higher 
levels of trust toward research derived from administrative data than that derived from 
survey data. A similar pattern is also found among higher education researchers and 
policy makers, who traditionally have less confidence in softer and more subjective 
measures such as survey data (Gonyea, 2005). At the same time, more than ever, the 
demands for external assessment and internal assessment have fueled an increase in the
demand for quality survey data (Porter, 2004). As such, to increase the acceptance and 
use of survey data, increased efforts are required to improve survey data quality in both 
higher education research and institutional research. 

Unfortunately, little literature is available that examines survey data quality issues in a 
holistic way for the purposes of institutional research professionals. There seem to be
two gaps in the existing efforts to improve survey methodology. First, while there is a
large amount of literature addressing various aspects of a survey project (Croninger & 
Douglas, 2005; Gonyea, 2005; Porter, 2004; Presser, Couper, Lessler, Martin, Rothgeb,
& Singer, 2004; Sanchez, 1992; Thomas, Heck, & Bauer, 2005), there is a lack of synthesis
of this literature for the purposes of quality control. Furthermore, issues of survey 
methodology have yet to be approached from a survey quality perspective. Because 
survey quality is a complex issue with multiple dimensions, a lack of synthesis may be 
understandable — when one particular area is addressed in depth, it is hard to cover other 
areas and have the breadth of the topic. However, this synthesis of knowledge is 
necessary for any kind of survey project if the researcher is serious about the survey data 
quality, regardless of whether the research concerns multi-institutional surveys such as 
the National Survey of Student Engagement (NSSE) or in-house surveys specific to one
institution. 

The second gap is that while “serious” institutional researchers (Gonyea, Korkmaz, 
BrckaLorenz & Miller, 2010) are making efforts to improve survey data quality, they 
often do not rely on theories in survey methodology, such as total survey error. As a 
result, little can be found in the extant institutional research literature in reference to total 
survey error theory and its applications in institutional research. As such, for institutional 
research purposes, it is meaningful to look at survey data quality issues in light of the 
total survey error theory. 

This task is all the more pressing given the unbalanced efforts among institutional researchers to
improve the quality of the data they work with. How institutional research provides
information support to a higher education institution is illustrated in a book that carries 
much weight in the field of institutional research - People, Processes, and Managing 
Data (McLaughlin & Howard, 2004). The book presents an information support cycle 
that involves three stakeholders through five stages of information management: the 
custodian/supplier (focusing on integrity of the data), the broker/producer (transforming 
the data into information), and the manager/user (taking the information and applying it 
to the situation); the center of this information support cycle is quality decision-making
(Figure 1). The institutional research function is identified as being mostly aligned to the
information broker role while relating to the data custodian and the data user (p. 17). 
What merits attention is that the book only speaks to administrative data in an institution 
and does not address survey research projects. It therefore remains unclear how
the institutional research function fits into the information support cycle when survey 
data are involved. 

Figure 1. Information Support Cycle in Institutional Research

In this context, this paper presents a model of a survey data quality strategy for 
institutional researchers in higher education in light of total survey error theory. In the 
following section, I briefly introduce the quality perspective of a survey process and the 
major components of total survey error. Then I will present the proposed strategy for 
survey data quality for institutional research purposes in a table with explanations to 
follow. The paper ends with discussions about the implications of the strategy for 
institutional researchers. It should be noted that this paper focuses on the big picture and 
the breadth of survey data quality issues in institutional research, rather than the depth of
each issue. Readers interested to know more about a particular topic are encouraged to 

1. The “Quality” Perspective of Survey Process and the Total Survey Error

A survey process can be viewed in two ways. First, it can be viewed from a process 
perspective, wherein one might examine all the steps and decisions that are required in a 
survey project, including: determining research objectives based on information needs, 
determining sampling methods, developing the survey instrument, administering the 
survey, conducting data analysis, and finally producing the survey report (Alreck & Settle, 
1985; Biemer & Lyberg, 2003). This approach describes the survey process as a 
procedure that is mostly sequential but with iterations. 

The second approach to a survey process is to view it from a quality perspective. This 
view does not focus on how best to implement each step in a survey process but 
concentrates on what problems may occur in each step and how to overcome or minimize 
the occurrence of those problems (Groves, Fowler, Couper, Lepkowski, Singer, & 
Tourangeau, 2004). In other words, it intends to examine errors that may occur in the 
survey process with a view to minimizing those errors, thereby improving survey data
quality. As such, the quality perspective is linked with survey errors, which is where the
concept of total survey error comes in.

Total survey error is the difference between a population mean, total or other 
population parameter, and the estimate of the parameter based on the sample survey (or 
census) (Biemer & Lyberg, 2003, p. 36). Survey errors can be organized in different ways.
Some categorize these into sampling error and non-sampling error (Biemer & Lyberg, 
2003); others classify them into observational error associated with measurement and 
non-observational error associated with representation (Groves et al., 2004). Regardless 
of different ways of organizing survey errors, the total survey error consists of the 
following five types of survey error: measurement error, coverage error, sampling error, 
non-response error, and post-survey or data processing error (Biemer & Lyberg, 2003; 
Groves et al., 2004; Weisberg, 2005). 

Each type of survey error creates the risk of variable errors and systematic errors, 
which can result in variance and bias respectively. Variance and bias are two measures of 
data quality, with variance being easier to measure and control than bias (Bailar, Herriot,
& Passel, 1982; Czaja & Blair, 2005). In general, among the five types of survey error, 
the sampling error has a low risk of systematic errors whereas the other four types have a 
high risk of systematic errors (Biemer & Lyberg, 2003, p.59). 
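
As a rough illustration of how variable and systematic errors show up as variance and bias, the sketch below summarizes repeated estimates of one parameter. It relies on the standard decomposition of mean squared error into squared bias plus variance, which is an assumption brought in for illustration rather than something stated in the sources cited above; the estimates and the "true" value are hypothetical.

```python
from statistics import mean, pvariance


def bias_variance_mse(estimates, true_value):
    """Summarize repeated estimates of one parameter as (bias, variance, mse)."""
    bias = mean(estimates) - true_value   # systematic error
    variance = pvariance(estimates)       # variable error around the average estimate
    mse = mean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse


# Hypothetical mean satisfaction scores from five replications of the same survey.
# The "true" population mean of 3.7 is assumed for illustration only.
estimates = [3.9, 4.1, 4.0, 4.2, 3.8]
bias, variance, mse = bias_variance_mse(estimates, true_value=3.7)
print(f"bias={bias:.2f}, variance={variance:.3f}, mse={mse:.3f}")
```

In practice the true value is unknown, which is one reason bias is harder to measure than variance.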

In a typical institutional research survey project, the target population is generally 
either students or faculty/staff members of the institution, and the sampling frame is 
usually available from the student information system or the human resources database. 
The target population often uses emails and online resources, and therefore web surveys 
are easier to administer than paper-based methods. Furthermore, the higher education
institution usually has access to relatively advanced data entry and processing resources 
(e.g., software and research expertise), which may help reduce data processing errors. 
Given those characteristics, the risk of variable errors and the risk of systematic errors in 
each type of survey error may be different from those in a survey project in other contexts. 
Table 1 shows an indication of the risk of variable and systematic errors in the five types 
of survey error in an institutional research survey project. 

Table 1: Risk of Variable and Systematic Error by Major Error Source in the Institutional 
Research Context 

2. Survey Data Quality Strategy 

The survey data quality strategy proposed in this paper addresses the five identified 
types of survey error in a framework of quality assurance and quality control. 

There is a fine distinction between the two concepts: “Quality assurance ensures that 
processes are capable of delivering good products, while quality control ensures that the 
product actually is good” (Lyberg & Biemer, 2008, p. 426). As such, quality assurance is 
related to the survey process while quality control is related to the survey product. The 
characteristics of quality survey data need to be defined for the purposes of quality 
control and the characteristics of a quality survey process should be identified for the
purpose of quality assurance. 

With these considerations in mind, the proposed survey data quality strategy is 
comprised of three components: quality measures, which are the characteristics or 
indicators of quality survey data; quality control procedures, which are used to inspect 
the survey data and examine whether the data have those characteristics of quality survey 
data; and quality assurance procedures, which are used to inspect the survey process and 
check whether certain procedures were implemented during the survey process to ensure 
that the resulting survey data set will have those characteristics of quality survey data. 

Table 2 shows the survey data quality strategy with the five types of survey error as 
one dimension, and the three areas of quality inspection for the survey data as the other. 
The strategy is also aligned with the two approaches to addressing the issue of survey 
errors: measurement and reduction (Czaja & Blair, 2005). The quality control procedures 
aim to measure and assess survey errors, and the quality assurance procedures are
intended to reduce survey errors. 

Table 2. Summary of the Major Components in the Survey Data Quality Strategy 

In the following sections, I will identify the quality measures and elaborate on the 
quality control and quality assurance procedures for each of the five types of survey error 
in the context of institutional research. Table 2 serves as a summary of those measures
and their related procedures. 

2.1 Quality Inspection for Measurement Error 

Measurement error occurs when there are differences between the responses obtained 
and what was to be measured. Relating to measurement error, quality survey data are 
characterized by three indicators: validity, reliability and minimized response bias. 

Quality Control Procedures 

Validity is the extent to which the survey measure actually reflects the intended 
construct. Validity is assessed by inspecting correlations. The validity of the survey can be
determined by examining concurrent validity (measured by a positive correlation between
the survey responses and responses to similar questions in other surveys) and divergent
validity (measured by a negative correlation between the survey responses and answers to 
other questions that measure a different construct). When the measurement is consistent 
with the theory behind it, then the data have construct validity. Factor analysis is a 
method to inspect construct validity. 
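
The correlation checks described above can be sketched in a few lines. The example below is a minimal illustration that computes a Pearson correlation by hand. The score lists are hypothetical, and the choice of Pearson's r is an assumption, since the text does not name a specific coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same five respondents
new_scale = [1, 2, 3, 4, 5]        # the survey measure being validated
similar_scale = [2, 2, 3, 5, 5]    # similar questions from another survey
different_scale = [5, 4, 3, 2, 1]  # questions measuring a different construct

r_concurrent = pearson_r(new_scale, similar_scale)
r_divergent = pearson_r(new_scale, different_scale)
```

A clearly positive `r_concurrent` and a negative `r_divergent` would be read as supporting evidence in the sense described above.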

Reliability is a measurement of the variability of answers over repeated conceptual 
trials. It addresses the question of whether respondents are consistent or stable in their 
answers, and it is therefore also known as response variance. Reliability is a function of
correlation of two survey estimates, and reliability in the responses can be assessed in
three ways: internal consistency (typically measured by Cronbach's alpha), split-half 
reliability and test-retest reliability (both typically measured by the Spearman-Brown 
coefficient). 
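
For internal consistency, Cronbach's alpha can be computed directly from the item responses. The sketch below uses the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), with hypothetical ratings from five respondents on a three-item scale. It assumes complete data with no missing answers.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                      # number of items in the scale
    items = list(zip(*responses))              # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 5-point ratings: 5 respondents x 3 items
ratings = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 4],
    [2, 3, 3],
    [4, 4, 5],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```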

Response bias is a systematic deviation or discrepancy between the sample estimate 
and the true population parameter; in other words, the respondent mean response is 
consistently higher or lower than the true mean score of the population. Response bias 
can come from sources such as social desirability, acquiescence, yea- and nay-saying, 
prestige, threat, hostility, auspices, mental set, order and extremity (Alreck & Settle, 1985,
p. 112).

There are two ways to assess response bias. One is to compare survey data with data 
or information from sources external to the survey. An example of this method is to 
check with the stakeholder or the sponsor of the survey project to find out whether the 
survey findings about a particular question are far different from their experience or
knowledge. Another way of assessment is to evaluate the occurrence of certain response 
tendencies, such as respondents giving socially acceptable answers, avoiding using the 
extreme response categories of a rating scale, or giving the same answer to all 
alternatives in a rating scale (known as strong satisficing) (Groves et al., 2004).
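
One of the response tendencies above, giving the same answer to every alternative in a rating scale, is easy to screen for. The sketch below is a minimal example with hypothetical data; it only flags candidates for review, since identical answers can also be sincere.

```python
def straightlining_rate(grid_responses):
    """Share of respondents who gave one identical answer to every item in a rating grid."""
    flagged = [row for row in grid_responses if len(set(row)) == 1]
    return len(flagged) / len(grid_responses)


# Hypothetical answers of four respondents to a four-item rating grid
grid = [
    [3, 3, 3, 3],   # same answer throughout
    [1, 2, 3, 4],
    [5, 5, 5, 5],   # same answer throughout
    [2, 2, 3, 2],
]
rate = straightlining_rate(grid)
```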

Quality Assurance Procedures

Measurement error can be reduced through the following techniques. First, 
researchers should construct a good questionnaire that is based on a well-supported 
theory or conceptual framework, and improve question wording and questionnaire 
structure. As an institutional research survey project is often initiated by certain 
institutional needs, the survey instrument tends to be constructed mainly from experience 
and less from a literature-based conceptual framework. However, despite the applied 
nature of an institutional research project, a literature review should not be completely
absent from the survey design process. 

Second, cognitive interviews should be conducted to make sure that the target 
population understands the questions in the way the questionnaire intended.

Third, adequate respondent behavior involves optimal completion of the cognitive 
process and sufficient motivation on the part of the respondent; therefore, the 
questionnaire should be designed and administered with a view to ensuring participants 
actually go through the four components of the mental processing in answering survey 
questions: comprehension, retrieval, judgment and response (Tourangeau, Rips, & 
Rasinski, 2000). An example of failing to do so is strong satisficing, which may occur
when respondents skip the retrieval and judgment steps and proceed to response steps 
(Groves et al., 2004). Questions can be asked regarding how the respondents complete
the questionnaire and how they self-evaluate their motivation and ability while 
completing the questionnaire. 

Fourth, for interviewer-administered surveys, steps need to be taken to ensure
adequate interviewer behavior, which is measured by low interviewer variance and can be
best analyzed with multilevel analysis (Loosveldt, Carton, & Billiet, 2004).

2.2 Quality Inspection for Coverage Error 

Coverage error occurs when differences between the sampling frame and the target 
population exist. Quality survey data are characterized by minimized discrepancy 
between the sampling frame and the target population. 

Quality Control Procedures 

Bias exists when some components in the target population are not available or 
accessible in the sampling frame. This may be caused by situations where elements in the 
target population do not, or cannot, appear in the sampling frame (i.e., undercoverage), 
units in the sampling frame are not in the target population (i.e., ineligible units), and 
several units in the frame are mapped onto the single element in the target population (i.e., 
duplication) (Groves et al., 2004). Therefore, researchers should check for those cases. 

Another procedure for assessing the discrepancy between the sampling frame and the
target population is to compare the specifications of the target population and the 
corresponding parameters of the sampling frame. As higher education institutions usually 
assign their students an institutional email address when they first enroll in their studies, 
the coverage error is not as big a threat to web surveys in institutional research as is 
found in other fields in general (Couper, 2000). 
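
The checks for undercoverage, ineligible units, and duplication described above can be sketched with simple set operations. The example assumes that both the target population and the sampling frame are available as lists of student identifiers. The identifiers and the function name are hypothetical, and matching on an ID is only an approximation of duplication, since one person may also appear under several IDs.

```python
from collections import Counter


def coverage_report(target_ids, frame_ids):
    """Compare a sampling frame with the target population it is meant to cover."""
    target = set(target_ids)
    frame_counts = Counter(frame_ids)
    frame = set(frame_counts)
    return {
        "undercoverage": sorted(target - frame),   # in the population, missing from the frame
        "ineligible": sorted(frame - target),      # in the frame, not in the population
        "duplicates": sorted(i for i, c in frame_counts.items() if c > 1),
    }


# Hypothetical identifiers
report = coverage_report(
    target_ids=["s1", "s2", "s3", "s4"],
    frame_ids=["s1", "s2", "s2", "s9"],
)
```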

Quality Assurance Procedures 

Researchers need to develop a working definition and clear specifications of the target 
population, and locate a readily available list that includes as many elements of target 
population as possible. In the context of institutional research, a typical target population 
is a student cohort with certain characteristics when they are applying for studies, are 
currently enrolled in certain programs of study, or have graduated within a certain time 
frame. As a post-secondary institution usually has a well-established student database, the 
sampling frame is often stable, complete and accessible. Therefore, the risks of variable
errors and systematic errors related to coverage error are generally low and more
controllable. 

2.3 Quality Inspection for Sampling Error

Sampling error is the difference between the survey estimate and the population 
parameter as a result of taking the sample instead of the entire population. A good set of 
survey data is representative of the sampling frame by known demographic parameters. 
When probability sampling is used, margin of error is a commonly used measurement of 
the level of random sampling error. A commonly accepted margin of error is less than
5% at the 95% confidence level. 

Quality Control Procedures 

Sample representativeness can be determined by comparing the frequency
distributions of the obtained sample and those of the sampling frame by certain 
demographic characteristics. If the difference in the frequency distributions is negligible, 
then the obtained sample is considered as representing the sampling frame by those 
indicators. 
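
A minimal sketch of this comparison is given below. It computes, for each demographic group, the difference between the share in the obtained sample and the share in the sampling frame. The counts are hypothetical, and what counts as a "negligible" difference is left to the researcher's judgment.

```python
def distribution_gaps(frame_counts, sample_counts):
    """Sample share minus frame share for each group (positive = over-represented)."""
    frame_total = sum(frame_counts.values())
    sample_total = sum(sample_counts.values())
    return {
        group: sample_counts.get(group, 0) / sample_total - count / frame_total
        for group, count in frame_counts.items()
    }


# Hypothetical counts by gender in the frame and among respondents
gaps = distribution_gaps(
    frame_counts={"female": 6000, "male": 4000},
    sample_counts={"female": 195, "male": 105},
)
```

Here female students would be over-represented by about five percentage points.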

Margin of error is influenced by variance of a variable and sample size: the smaller the 
variance is and the larger the sample size is, the smaller the margin of error becomes. 
There is a distinction between margin of error for a variable and margin of error for a 
survey: the margin of error for a survey uses the maximum variance (where the standard 
deviation equals 0.5) (Groves et al., 2004). Margin of error for a survey can be calculated
if the total of the target population and the total number of respondents are known. 
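
The calculation can be sketched as follows. The function assumes simple random sampling, the maximum variance case (a proportion of 0.5, so a standard deviation of 0.5), and a finite population correction. The correction is my assumption; it is the form that reproduces figures such as 5.57% for 300 respondents out of 10,000.

```python
from math import sqrt


def survey_margin_of_error(population, respondents, z=1.96, p=0.5):
    """Margin of error for a survey at maximum variance, with finite population correction."""
    standard_error = sqrt(p * (1 - p) / respondents)
    correction = sqrt((population - respondents) / (population - 1))
    return z * standard_error * correction


# z = 1.96 corresponds to the 95% confidence level
moe_300 = survey_margin_of_error(population=10_000, respondents=300)
print(f"{moe_300:.2%}")
```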

Quality Assurance Procedures

The magnitude of sampling error is more controllable than other types of survey 
errors and therefore is considered an intentional error (Biemer & Lyberg, 2003).
Sampling error can be controlled through good sample selection, which is characterized by
random selection of elements in the sampling frame and key subgroups of the population 
being well-represented in the sample. Appropriate implementation of a sampling 
procedure requires four considerations: probability sampling, stratification, clustering, 
and sample size. Sampling bias can be easily removed by giving all elements an equal 
chance of selection; sampling variance is reduced when the sample size is big and the 
sample is stratified and is not clustered (Groves et al., 2004). 

An appropriate sampling strategy involves calculating a reasonable sample size. The 
sample size is determined by the sampling frame, expected margin of error,
anticipated response rate, data breakdown for analysis, and the resources available for the
survey. Table 3 shows an example to illustrate this. 

Table 3. An Example of Determining the Sample Size 

With a sampling frame of 10,000 students, if the expected number of respondents is
300, then the margin of error is 5.57% at the 95% confidence level. If a response rate of 
20% is anticipated, then a sample of 1,500 students is needed for the survey. If the 300 
expected respondents are going to be broken down into subgroups for data analysis 
purposes, for example, according to the six schools or faculties of the institution, then
each subgroup has about 50 respondents. This number of respondents is acceptable for
descriptive statistical analysis. However, if the researcher intends to conduct inferential 
statistical analysis or multivariate data analysis, s/he may want to consider increasing the 
total expected number of respondents to 900, which results in a margin of error of 3.12%
and a sample size of 4,500 when the anticipated response rate remains 20%. If the survey 
project is going to be administered in the web mode, the sample size will not matter much 
because the bigger size will not add costs to the survey; however, when the mail survey is 
used, the increased costs from questionnaire distribution, and data entry and processing 
will be a consideration for the sample size to be determined. This process of computing 
can be facilitated by using an online sample size calculation tool (e.g., 
http://www.raosoft.com/samplesize.html).
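
The arithmetic in this example can be reproduced with a short script. The sketch below is illustrative only: the function and parameter names are hypothetical, the margin of error assumes maximum variance with a finite population correction at the 95% confidence level, and respondents are assumed to be spread evenly across subgroups.

```python
from math import ceil, sqrt


def sampling_plan(frame_size, expected_respondents, response_rate_pct, subgroups, z=1.96):
    """Margin of error, number of invitations, and respondents per subgroup for one scenario."""
    moe = (
        z
        * sqrt(0.25 / expected_respondents)
        * sqrt((frame_size - expected_respondents) / (frame_size - 1))
    )
    return {
        "margin_of_error_pct": round(moe * 100, 2),
        # invitations needed to obtain the expected respondents at the anticipated response rate
        "sample_to_invite": ceil(expected_respondents * 100 / response_rate_pct),
        "respondents_per_subgroup": expected_respondents // subgroups,
    }


# The two scenarios discussed above: 300 and 900 expected respondents
plan_300 = sampling_plan(10_000, 300, response_rate_pct=20, subgroups=6)
plan_900 = sampling_plan(10_000, 900, response_rate_pct=20, subgroups=6)
```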

2.4 Quality Inspection for Non-response Error

Non-response errors occur when some people from the sample who were invited to 
participate in the survey do not respond to the survey request or fail to respond to some of 
the questions in the survey. Therefore, there are two types of non-response error:
non-responses at the unit level and non-responses at the question item level (Biemer &
Lyberg, 2003; Groves et al., 2004; Weisberg, 2005). Non-response bias occurs when the
statistics computed from respondent data differ systematically from those based on the 
entire sample data. 

Quality Control Procedures for Unit Non-responses 

Non-response bias is measured as a function of the non-response rate and
the difference between the respondent and non-respondent estimates (Biemer & Lyberg, 
2003). As the non-response rate is the proportion of eligible survey recipients who did 
not respond to the survey, it can be calculated from the response rate. As such, a quality 
survey data set is characterized by a reasonable response rate and insignificant difference 
between respondents and non-respondents with regard to the characteristics of interest to 
the survey. 
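
Under the common simplification that the bias of a respondent mean equals the non-response rate multiplied by the difference between respondent and non-respondent means, the relationship can be sketched as below. The product form is my reading of "function of" here, and all the numbers are hypothetical. In a real project the non-respondent mean is unknown and has to be approximated.

```python
def nonresponse_bias(response_rate, respondent_mean, nonrespondent_mean):
    """Approximate bias of the respondent mean relative to the full-sample mean."""
    nonresponse_rate = 1 - response_rate   # derived from the response rate
    return nonresponse_rate * (respondent_mean - nonrespondent_mean)


# Hypothetical case: 20% response rate, respondents rate a service 4.0, non-respondents 3.5
bias_estimate = nonresponse_bias(0.20, respondent_mean=4.0, nonrespondent_mean=3.5)
```

The example shows why a low response rate is harmless only when respondents and non-respondents do not differ: with equal means, the bias is zero at any response rate.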

The response rate can be calculated in two ways: one is I/(I+R+NC+O-IN) (I:
completed questionnaires; R: refusals; NC: non-contacts; O: others (e.g., those who
cannot understand due to language problems); IN: ineligible respondents); the other is
simply I/(S-IN) (I: completed questionnaires; S: survey recipients; IN: ineligible 
respondents). 
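
Both formulas can be written as small functions. The sketch below follows the formulas as given and assumes that ineligible recipients are counted within the disposition categories, so that I + R + NC + O equals S. That assumption is mine, and the counts are hypothetical.

```python
def response_rate_by_disposition(completed, refusals, non_contacts, others, ineligible):
    """I / (I + R + NC + O - IN)."""
    return completed / (completed + refusals + non_contacts + others - ineligible)


def response_rate_by_recipients(completed, recipients, ineligible):
    """I / (S - IN)."""
    return completed / (recipients - ineligible)


# Hypothetical dispositions for 1,500 survey recipients, 20 of whom turn out to be ineligible
rate_a = response_rate_by_disposition(
    completed=300, refusals=400, non_contacts=750, others=50, ineligible=20
)
rate_b = response_rate_by_recipients(completed=300, recipients=1500, ineligible=20)
```

Under the stated assumption the two functions return the same rate.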

Three methods can be used to assess the difference between respondents and non- 
respondents. First, assess the degree to which non-respondents will interact with the 
topics or issues of the survey. Usually, those who are highly involved with the topic of 
the survey are more likely to respond than those who are not and those who have neutral 
opinions about the topic or have less experience are more likely to discard the 
questionnaire (Alreck & Settle, 1985). For example, when an institution is conducting a 
survey targeting its current students to find out how they have used its library services,
it is important for the researcher to bear in mind that the obtained responses will over- 
represent the characteristics of those actual library users as those who have used the 
library services are more likely to respond to the survey. Therefore, it is erroneous to use 
the data to draw a conclusion about the library use pattern of all current students. Second, 
compare survey respondents’ demographic characteristics with those of the sampling 
frame and find out whether the respondents under-represent some sub-groups in the 
sampling frame and whether members of the under-represented groups tend to answer 
some of the substantive survey questions somewhat differently than others (Czaja & Blair, 
2005). Third, examine the characteristics of the late respondents. Those who respond to
the very last follow-ups may have similar characteristics to people who never respond, 
and hence, the responses from late respondents can be used to infer what the
non-respondents would have answered (Suskie, 1996).
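
The third method can be sketched by comparing the answers of late respondents with those of earlier respondents. The example assumes that each response is tagged with the contact wave in which it arrived; the wave numbers and scores are hypothetical.

```python
from statistics import mean


def late_vs_early_gap(responses, late_wave):
    """Mean score of late respondents minus mean score of earlier respondents."""
    early = [score for wave, score in responses if wave < late_wave]
    late = [score for wave, score in responses if wave >= late_wave]
    return mean(late) - mean(early)


# (contact wave, score) pairs; wave 3 is the final reminder
wave_scores = [(1, 4.0), (1, 4.2), (2, 4.1), (3, 3.5), (3, 3.3)]
gap = late_vs_early_gap(wave_scores, late_wave=3)
```

A sizeable gap suggests that non-respondents may differ from respondents in the same direction.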

Quality Control Procedures for Item Non-response 

Similar to the unit non-response error, item non-response error is a function of the 
item non-response rate and the difference between item respondents and non-respondents. 
Item non-response results in missing data. Hence, the characteristics of a quality survey 
data set at the question item level are two: a reasonable proportion of missing data for 
responses to each question, and an insignificant difference between respondents and non- 
respondents to each question item. 

Investigation into missing data and the difference between item respondents and non- 
respondents involves item non-response analysis. A relatively large proportion of 
missing data for question items should be reported as a red flag for further investigation. 
Item non-response analysis involves examining (a) whether non-response occurrence is 
related to certain demographic characteristics of the respondents, or in other words, 
whether a particular group of respondents tends to give fewer substantive responses than
others, and (b) whether the non-responses to various question items are related. 

Item non-response analysis can be conducted in three ways: (a) calculate the
proportion of missing data for each question; (b) decide the characteristics of missing 
data: missing completely at random, missing at random, or missing not at random 
(Allison, 2002); (c) investigate the variables with a large proportion of missing data. 
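
Steps (a) and (c) can be sketched as follows, together with a first look at whether missingness on a flagged item is related to a demographic characteristic. The record layout, the variable names, and the 30% flagging threshold are assumptions made for the illustration. The sketch does not determine the missing-data mechanism in step (b), which requires a more formal analysis.

```python
def missing_proportions(records, items):
    """Proportion of respondents with no answer (None) for each question item."""
    n = len(records)
    return {item: sum(r[item] is None for r in records) / n for item in items}


def missing_rate_by_group(records, item, group_key):
    """Item non-response rate for one question, broken down by a demographic variable."""
    totals, missing = {}, {}
    for r in records:
        g = r[group_key]
        totals[g] = totals.get(g, 0) + 1
        missing[g] = missing.get(g, 0) + (r[item] is None)
    return {g: missing[g] / totals[g] for g in totals}


# Hypothetical responses; None marks a skipped question.
records = [
    {"gender": "F", "q1": 4, "q2": 3},
    {"gender": "F", "q1": 5, "q2": None},
    {"gender": "M", "q1": 3, "q2": None},
    {"gender": "M", "q1": None, "q2": None},
]

THRESHOLD = 0.30  # assumed cut-off for "a relatively large proportion"

props = missing_proportions(records, ["q1", "q2"])
flagged = [item for item, p in props.items() if p > THRESHOLD]
by_gender = missing_rate_by_group(records, "q2", "gender")

print(props)
print(flagged)
print(by_gender)
```

Large differences in the item non-response rate between groups suggest that the data are not missing completely at random and that the flagged item deserves further investigation.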

Quality Assurance Procedures for Non-responses 

Quality assurance procedures are derived from three types of unit non-response: 
non-contacts (failure to reach the survey recipients), refusals (recipients’ decline of the survey 
request), and recipients’ inability to participate (Groves et al., 2004). The third situation 
also applies to item non-response. Survey non-response has been increasing and much of 
the non-response is due to rising rates of refusals. A particular problem for institutional 
research survey projects is multiple surveys of students and the resulting survey fatigue 
(Porter, Whitcomb, & Weitzer, 2004). 

The reasons for non-responses can be social (e.g., survey fatigue) or demographically 
related (e.g., male students may be less likely to respond to a survey request), and can be 
related to questionnaire design and survey administration. The factors related to survey 
design are more controllable than social or personal factors. 

Various techniques can be used to reduce the three types of non-response from a survey 
design point of view (Czaja & Blair, 2005; Groves et al., 2004; Porter, 2004). Examples 
for reducing non-contacts are obtaining accurate contact information for the sampled 
recipients and, when using web surveys, preventing the survey invitation from being marked as 
spam. Effective methods to combat refusals are notification about the survey prior 
to its launch, courteous initial contact messages (letter or email), an appropriate manner of 
requesting participation (e.g., the tone of the request, the signature, salience, and ensured 
confidentiality), a reasonable number of reminders, appropriate timing of data 
collection, and the use of incentives. To help with the ability to participate, the survey 
instrument should be of reasonable length, be easy to read, and ask for relevant, available, or 
accessible information rather than inestimable information (Groves et al., 2004). 

2.5 Quality Inspection for Post-survey Error. 

Post-survey error refers to all the errors that occur in data processing after 
the survey data are collected. Reliable findings and valid conclusions rely on correct data 
processing procedures for both individual data and aggregate data. The literature on controlling 
this type of error is small relative to that on other types of survey error, and data processing 
is considered a neglected error source in survey research (Biemer & Lyberg, 2003). 

Quality Assurance Procedures 

Various data processing activities after data collection can be organized into three 
types of procedure: data cleaning, data adjustment, and data analysis. Data cleaning 
involves inspecting accuracy of data entry and checking for outliers. Data adjustment 
involves considerations of using weights, handling missing data, or creating composite 
variables. Data analysis involves inspecting assumptions, choosing appropriate statistical 
techniques, and conducting statistical computing. When open-ended questions are used, 
coding is required. Inspecting weaknesses in coding structure and coder variance are two 
quality control procedures for coding (Groves et al., 2004). While numerous resources on 
statistics and survey methodology are available to address how to correctly and 
appropriately implement each of these procedures, quality control seems to rely, to a 
large extent, on the expertise of individual researchers. 
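
As one illustration of the data adjustment step, the sketch below computes simple post-stratification weights, in which each group's weight is its share in the sampling frame divided by its share among respondents. Post-stratification is only one of several weighting approaches and is chosen here for brevity; the paper does not prescribe a particular method. The function name and the counts are hypothetical.

```python
def poststratification_weights(frame_counts, respondent_counts):
    """Weight per group = share in the sampling frame / share among respondents."""
    n_frame = sum(frame_counts.values())
    n_resp = sum(respondent_counts.values())
    weights = {}
    for group, resp_n in respondent_counts.items():
        frame_share = frame_counts[group] / n_frame
        resp_share = resp_n / n_resp
        weights[group] = frame_share / resp_share
    return weights


# Hypothetical counts by enrolment status.
frame_counts = {"full-time": 700, "part-time": 300}
respondent_counts = {"full-time": 170, "part-time": 30}

weights = poststratification_weights(frame_counts, respondent_counts)

# Sanity check: the weighted number of respondents equals the unweighted number.
weighted_total = sum(weights[g] * n for g, n in respondent_counts.items())

print(weights)
print(weighted_total)
```

Checking that the weighted total matches the number of respondents is a small example of the kind of quality control that otherwise rests on the individual researcher's expertise.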

3. Implications 

The survey data quality strategy presented in this paper has two practical implications 
for institutional researchers. 

First, survey data quality requires the use of multiple indicators. Survey data quality is 
multi-dimensional. This means that relying on one single indicator to evaluate survey 
data is a misleading practice. An example of this is the myth about the response rate. 
Institutional researchers sometimes hear such comments from information users as “With 
such a low response rate, the survey results are problematic” and “The response rate of 
the survey is high so the sample represents the population well”. 

The survey data quality strategy (see Table 2) suggests that a reasonably high response 
rate is simply one of the indicators of quality data, though a very important one. In other 
words, the difference between respondents and non-respondents needs to be considered when the 
researcher assesses the unit non-response error, and quality indicators derived from other 
types of survey error (e.g., sample representativeness) need to be included in the 
evaluation. Also, the response rate does not speak to the representativeness of survey 
responses, which is a separate indicator of non-response bias. A relatively 
high response rate reduces the risk of non-response bias; however, this single indicator 
does not necessarily lead to the conclusion that the survey data have low non-response 
bias if the non-respondents are very distinctive on the survey variable (Groves et al., 
2004). As such, the strategy helps debunk some myths about survey data quality and 
encourages researchers to examine other quality indicators in addition to the response rate. 
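
A small numerical sketch shows why the response rate alone is not enough. It assumes the common deterministic view of non-response, in which the bias of the respondent mean equals the non-response rate multiplied by the difference between the respondent mean and the non-respondent mean. The two surveys and all their figures are hypothetical, and in practice the non-respondent mean is unknown and has to be approximated, for example from late respondents.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean = non-response rate x (respondent mean - non-respondent mean)."""
    return (1 - response_rate) * (mean_respondents - mean_nonrespondents)


# Hypothetical survey A: high response rate, but non-respondents are very distinctive.
bias_high_rate = nonresponse_bias(0.80, 0.60, 0.20)

# Hypothetical survey B: low response rate, but non-respondents resemble respondents.
bias_low_rate = nonresponse_bias(0.30, 0.50, 0.48)

print(round(bias_high_rate, 3))
print(round(bias_low_rate, 3))
```

Under these assumed figures, the survey with the 80% response rate carries the larger bias, which is the point of examining the difference between respondents and non-respondents alongside the response rate.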

Secondly, it is important to document survey data quality. The survey data quality 
strategy lends more importance to quality documentation. The strategy proceeds from 
types of survey error based on the theory of total survey error, and consists of quality 
measures for each type of survey error, and quality control and quality assurance 
procedures that address each of the quality measures. What is contained in Table 2 is, as a 
matter of fact, a variety of ways for deriving evidence for survey data quality and the 
procedures for collecting the evidence. Therefore, in the final analysis, the task of an 
institutional researcher is to collect evidence regarding various characteristics of the 
obtained survey data to convince information users that the data 
support certain conclusions. The more evidence is identified and presented, 
the greater the trust gained from information users. This evidence collection 
process requires documentation. 

The information about the identified evidence relating to survey data quality is called 
“metadata” (i.e., data about data). Four types of metadata can be used to document survey 
data quality (Groves et al., 2004): definitional (investigated constructs, target population, 
sampling frame, coding terminology), procedural (data collection procedures), 
operational (data cleaning, data adjustment and data analysis procedures), and systems 
(data set format, file location, retrieval protocol, codebook). 

In an institutional research project report, a detailed description of the methodology 
employed in the study usually is provided in an appendix (Bers & Seybert, 1999). The 
appendix is a good place in a survey report to document the evidence for survey data 
quality. A psychometric portfolio is also recommended for reporting evidence (Gonyea et 
al., 2010). 

The purpose of documentation is to communicate the characteristics of the survey data 
set and the procedures by which quality indicators, if any, were obtained, so as to build 
and enhance trust among information users in the survey findings and help them interpret the 
findings in an appropriate way. On the basis of the elements in the survey data quality 
strategy in Table 2, I have developed a checklist (see the Appendix) for institutional 
researchers to facilitate their documentation efforts. 

4. Concluding Thoughts 

This paper presents a survey data quality strategy in light of the theory of total survey 
error for the purpose of institutional research that is conducted in higher education 
institutions. The strategy consists of indicators of quality data (quality measures), and 
procedures to inspect the survey data and the survey process (quality control and quality 
assurance procedures). The strategy is summarized in Table 2 and elaborated in Section 2 
of this paper, with an attached checklist for institutional researchers. 

Here are two final thoughts about survey data quality issues. 

The first thought is related to how “survey data quality” fits in “survey quality”. 
Survey quality is recognized as a three-level concept (Lyberg & Biemer, 2008). The three 
levels are product quality (i.e., “the set of product characteristics ideally established with 
the main users”), process quality (i.e., “a well-designed and tightly-controlled process”), 
and organizational quality (i.e., “reliable organizational characteristics to ensure that the 
organization is capable to develop dependable processes that can deliver quality 
products”) (p. 428). The three levels are interdependent: organizational quality is required 
for process quality and process quality is required for product quality. All the three levels 
contribute to quality decision-making. 

Survey data quality is actually part of the product quality of a survey and “is achieved 
through process quality” (Biemer & Lyberg, 2003, p. 24). The survey data quality 
strategy mainly addresses two of the three levels of survey quality: product 
quality and process quality. Organizational quality is more concerned with 
organizational culture and information management, and involves the information 
infrastructure for quality survey data. It is not addressed in the strategy presented in 
this paper. 

The second thought is related to where survey data quality stands in the information 
support cycle in institutional research. When the survey data quality strategy proposed in 
this paper is placed in the information support cycle (McLaughlin & Howard, 2004) (see 
Figure 1), the institutional researcher takes on the bulk of the responsibilities in this 
cycle, namely those of both the Custodian and the Broker, and implements the 
larger part of the information support cycle, from identifying concepts to 
delivering a report, while working with the Manager. In contrast, when administrative 
data are used, the institutional researcher usually is less involved in the 
stages of collecting and storing data. Therefore, the role of the researcher is of greater 
importance in the information support cycle when a survey project is conducted. This 
may provide another reason for investigating survey data quality issues further and 
making efforts to improve survey data quality to better fulfill the information support 
function of institutional research. 

In a context where attention to detail and quality control is recognized as a personal 
and professional dimension of effectiveness in institutional research (Knight, 2010), it is 
hoped that this paper has helped fill a gap in the literature on survey quality 
control for institutional research purposes. 

Figure 1. Information Support Cycle in Institutional Research 

Table 1. Risk of Variable and Systematic Error by Major Error Source in the Institutional 
Research Context 

Types of Survey Error               Risk of Variable Error   Risk of Systematic Error
Measurement error                   High                     High
Coverage error                      Low                      Low
Sampling error                      High                     Low
Non-response error                  Low                      High
Post-survey/Data processing error   High                     Low

Table 2. Summary of the Major Components in the Survey Data Quality Strategy 

Type of error: Measurement error 

  Quality Measures (Indicators of Quality Data): 
    1) Validity 
    2) Reliability 
    3) Minimized response bias 

  Quality Control Procedures (inspecting the survey data): 
    1) Assess validity by checking construct validity (e.g., factor analysis), 
       concurrent validity, and divergent validity 
    2) Assess reliability by checking internal consistency, split-half reliability, 
       and test-retest reliability 
    3) Assess response bias by 
       o comparing survey data with data or information from sources external to 
         the survey; 
       o checking response tendencies 

  Quality Assurance Procedures (inspecting the survey process): 
    Construct a good questionnaire; 
    Conduct cognitive interviews; 
    Ensure adequate respondent behavior; 
    Ensure adequate interviewer behavior (if interviewer-administered) 

Type of error: Coverage error 

  Quality Measures (Indicators of Quality Data): 
    Minimized discrepancy between the sampling frame and the target population 

  Quality Control Procedures (inspecting the survey data): 
    Check whether there exist undercoverage, ineligible units, or duplications in 
    the sampling frame; 
    Compare the specifications of the target population and the corresponding 
    parameters of the sampling frame 

  Quality Assurance Procedures (inspecting the survey process): 
    Develop a working definition and clearly defined specifications of the target 
    population; 
    Locate a readily available list that includes as many elements of the target 
    population as possible 

Type of error: Sampling error 

  Quality Measures (Indicators of Quality Data): 
    1) Sample representativeness 
    2) Reasonable margin of error 

  Quality Control Procedures (inspecting the survey data): 
    1) Compare the distributions of the obtained sample with those of the sampling 
       frame by certain demographic characteristics; 
    2) Make sure the margin of error is below 5% for the expected number of 
       respondents at the 95% confidence level 

  Quality Assurance Procedures (inspecting the survey process): 
    Appropriate implementation of sampling procedure; 
    Calculate a reasonable sample size based on the size of the sampling frame, the 
    expected margin of error, data breakdown for analysis, the anticipated response 
    rate, and the resources available for the survey 

Type of error: Non-response error 

  Quality Measures (Indicators of Quality Data), at the unit level: 
    1) A reasonable response rate; 
    2) Insignificant difference between respondents and non-respondents 

  Quality Control Procedures (inspecting the survey data), at the unit level: 
    1) Calculate the response rate: 
       Response rate = I/(I+R+NC+O-IN) or I/(S-IN) 
    2) Assess the difference between the respondents and the non-respondents by: 
       o Assessing the level of interaction of non-response with the topics; 
       o Comparing survey respondents’ demographic characteristics with those of 
         the sampling frame; 
       o Examining the characteristics of the late respondents. 

  Quality Measures (Indicators of Quality Data), at the item level: 
    1) A reasonable proportion of missing data in responses to each question 
    2) Insignificant difference between respondents and non-respondents to each 
       question 

  Quality Control Procedures (inspecting the survey data), at the item level: 
    Do item non-response analysis by 
    1) Calculating the proportion of missing data for each question 
    2) Deciding whether the data are missing completely at random 
    3) Investigating the variables with a large proportion of missing data 

  Quality Assurance Procedures (inspecting the survey process): 
    Use techniques in the process of questionnaire design and survey administration 
    to combat: 
    o Non-contacts 
    o Refusals 
    o Inability to participate 

Type of error: Post-survey error 

  Quality Measures (Indicators of Quality Data): 
    Correctness in processing individual cases and aggregate data 

  Quality Control Procedures (inspecting the survey data): 
    Rely more on expertise of individual researchers 

  Quality Assurance Procedures (inspecting the survey process): 
    Carefully perform the procedures of 
    o Data cleaning 
    o Data adjustment 
    o Data analysis 

Table 3. An Example of Determining the Sample Size 



Expected number    Margin of Error           Anticipated      Sample
of respondents     (95% confidence level)    Response Rate    Size
300                5.57%                     20%              1500
400                4.80%                     20%              2000
500                4.27%                     20%              2500
600                3.88%                     20%              3000
700                3.57%                     20%              3500
800                3.32%                     20%              4000
900                3.12%                     20%              4500

Note: The sampling frame is 10,000 students. 
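
The figures in Table 3 can be reproduced with the short sketch below. It assumes the usual margin-of-error formula for a proportion under simple random sampling, with the most conservative proportion (p = 0.5), z = 1.96 for the 95% confidence level, and a finite population correction for the 10,000-student sampling frame. The formula is not stated in this section, so this is an assumption, although it matches the tabled values to two decimals. The function names are illustrative.

```python
import math


def margin_of_error(n, frame_size, p=0.5, z=1.96):
    """Margin of error for a proportion, with finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((frame_size - n) / (frame_size - 1))
    return z * standard_error * fpc


def required_sample_size(expected_respondents, response_rate):
    """Number of people to invite so that the expected number of respondents is reached."""
    return round(expected_respondents / response_rate)


FRAME_SIZE = 10_000    # sampling frame from the note to Table 3
RESPONSE_RATE = 0.20   # anticipated response rate from Table 3

rows = []
for expected in range(300, 1000, 100):
    moe_percent = round(100 * margin_of_error(expected, FRAME_SIZE), 2)
    rows.append((expected, moe_percent, required_sample_size(expected, RESPONSE_RATE)))

for expected, moe_percent, sample_size in rows:
    print(f"{expected:>4}  {moe_percent:.2f}%  {sample_size}")
```

Read against the 5% criterion in Table 2, the output shows that 300 expected respondents would not be enough for this sampling frame, whereas 400 or more would.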

Appendix: A Checklist for Inspecting Survey Data Quality 

Questions to detect measurement error: 

□ Is there any evidence that shows the reliability of the survey instrument? 

□ Is there any evidence that shows the validity of the inference? 

□ Is there any pattern or tendency in the responses? Are there any cases where the same answer 
was given to all the question items? 

Questions to detect coverage error: 

□ Were the specifications of the target population clearly defined? Did the parameters of the 
sampling frame correspond with the specifications of the target population? 

□ Was there any undercoverage in the sampling frame? Were there any ineligible units or duplications? 

Questions to detect sampling error: 

□ Was the sample size reasonable? (How many respondents did you expect to get? What 
margin of error was expected? What response rate was anticipated?) 

□ What sampling method was used? Was the method appropriate? Did you give equal chance 
of selection? For which subgroups did you give unequal chance of selection (if using 
stratification)? 

□ With the number of the respondents and the total target population, what was the margin of 
error? Was the obtained margin of error reasonable? 

Questions to detect non-response error: 

□ Was the response rate reasonable? 

□ Considering the topic of the survey, who may have been more likely to respond to the survey 
and who may have been less likely to respond? 

□ How did the profile of the respondents compare with that of the target population (or 
sampling frame) by certain demographic characteristics (e.g., campus, program type, credential, 
gender, age, etc.)? 

□ How were key subgroups represented in the respondents? 

□ In which questions did you have a relatively large proportion of missing data? Why? 
Did you report the missing data? 

Questions to detect post-survey error: 

□ Did you treat data with non-response bias as representing the characteristics of the total 
population? 

□ How were the data cleaned? Was the procedure appropriate? 

□ How were the data coded? Was the procedure appropriate? 

□ How were the data weighted? (if applicable) Was the method correct? 

□ How were the data imputed? (if applicable) Was the method appropriate? 

□ What statistical techniques were used? Had the assumptions been inspected? Were the 
statistical procedures properly implemented? 